Abstract- A cellular logic array is described for squaring binary numbers. This 
array offers a significant increase in speed with a relatively small hardware 
overhead. This improvement is a result of a novel implementation of the formula 
$(x + y)^2 = x^2 + y^2 + 2xy$. These results can also be incorporated in the existing 
arrays, achieving considerable hardware reduction. 


1 Introduction 

The advent of VLSI has spurred a renewed interest in the development of specialized 
arithmetic circuits. Special arithmetic functions like squares and square-roots are generally 
implemented in software. However, when a machine is designed for a specific application, 
wherein squaring is a frequent process, it may prove advantageous in terms of speed to use a 
hardware implementation. Most of the approaches reported in the literature for squaring and 
square-rooting use array multipliers or special purpose arrays which perform a multitude 
of other operations in addition to squaring. As a result, there are very few arrays which are 
solely devoted to extraction of squares. However, Dean [1] has reported such a dedicated 
array, which is probably among the fastest squaring circuits known thus far. In 
addition, Dean’s array uses considerably less hardware than other arrays reported so far. 
Hence Dean’s array has been selected as the basis for comparison with the array 
proposed in this paper. The proposed array provides a significant gain in speed, with 
a very small hardware overhead, as compared to Dean’s squarer [1]. 


2 Algorithm 

Dean [1] has not presented a formal algorithm for his implementation. So, for clarity, the 
widely used general binary squaring algorithm [3] will be presented first, followed by the 
proposed algorithm. The existing algorithm for binary squaring is generally formulated as 
follows: 

$(1)^2 = (01)_b$ 

$(a_1 1)^2 = (a_1)^2 + (0 a_1 0 1)_b$, or 

$F_2 = F_1 + (0 a_1 0 1)_b$ 

where $F_1 = (01)_b$ if $a_1 = 1$ and $F_1 = (00)_b$ otherwise. Similarly, we have 

$(a_2 a_1 1)^2 = (a_2 a_1)^2 + (0 0 a_2 a_1 0 1)_b$, or 

$F_3 = F_2 + (0 0 a_2 a_1 0 1)_b$ 

In general, if $a_{r+1} = 1$ then 

$F_{r+1} = F_r + D_r$ 

where $F_r = (a_r a_{r-1} \ldots a_2 a_1)^2$ is the $r$th square and $D_r = 00 \ldots 0 a_r a_{r-1} \ldots a_1 0 1$ 
is called the $r$th radicand. It is obvious that $F_{r+1} = F_r$ if $a_{r+1} = 0$. The above 
iterative formula applies for all $r = 1, 2, \ldots, n$. Figures 4 and 5 show the schematic 
details of a three bit squaring array for the above algorithm [3]. 

1 This research was supported (or partially supported) by NASA under Space Engineering Research 
Center Grant NAGW-1406. 
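The recurrence $F_{r+1} = F_r + D_r$ can be checked in software. The following is a minimal sketch, not the array of Figures 4 and 5. It assumes that the bits are consumed most significant first and that the running square is shifted left by two places at every step, a shift the formulas above leave implicit. The function name `square_msb_first` is hypothetical.

```python
def square_msb_first(bits):
    """Square a binary number given as a list of bits, most significant first.

    Assumption (not stated explicitly in the text): before each addition the
    running square is shifted left two places, so that for X' = 2*X + b
    we get X'^2 = 4*X^2 + b*(4*X + 1).
    """
    x = 0  # value of the bits consumed so far
    f = 0  # running square F_r, always equal to x * x
    for b in bits:
        if b not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        radicand = (x << 2) | 1  # D_r: the bits of x followed by 0 1
        f = (f << 2) + (radicand if b else 0)  # F is unchanged (up to the shift) when b = 0
        x = (x << 1) | b
    return f
```

For example, `square_msb_first([1, 0, 1])` adds the radicands $(1)_b$ and $(1001)_b$ and returns 25.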

The proposed algorithm makes use of the well known formula $(x + y)^2 = x^2 + y^2 + 2xy$. 
Consider a three bit number $(a_2 2^2 + a_1 2^1 + a_0 2^0)$. The LSB-1 and LSB of the square of 
any number will respectively be 0 and the LSB of the original number itself. Therefore, 

$(a_2 2^2 + a_1 2^1 + a_0 2^0)^2 = (a_2 + a_2 a_1) 2^4 + (a_2 a_0) 2^3 + (a_1 a_0 + a_1) 2^2 + a_0$. 

The same result can also be achieved by the repeated application of the formula 
$(x + y)^2 = x^2 + y^2 + 2xy$, where $y$ is the LSB and $x$ is the rest of the binary number. 
With $x = a_2 2^1$ and $y = a_1 2^0$, and using $a_i^2 = a_i$ for a single bit, 

$(a_2 2^1 + a_1 2^0)^2 = (a_2 2^1)^2 + 2(a_2 a_1 2^1) + a_1 2^0$ 

$= (a_2) 2^2 + (a_2 a_1) 2^2 + a_1 2^0$ 

$= (a_2 + a_2 a_1) 2^2 + a_1 2^0$    (1) 

Also, with $x = a_2 2^2 + a_1 2^1$ and $y = a_0 2^0$, 

$(a_2 2^2 + a_1 2^1 + a_0 2^0)^2 = (a_2 2^2 + a_1 2^1)^2 + 2(a_2 a_0 2^2 + a_1 a_0 2^1) + a_0 2^0$ 

$= (a_2 2^1 + a_1 2^0)^2 2^2 + (a_2 a_0 2^3 + a_1 a_0 2^2) + a_0 2^0$    (2) 

Equation 1 shows that the LSB-1 bit of the final answer is always 0 and that the LSB of 
the final answer is the LSB of the original number itself. Since multiplication by 2 implies 
a left-shift by one bit position, the term $(2 a_2 a_1)$ has been shifted from the $2^1$ bit position 
to the $2^2$ bit position in Equation 1. This result for a three bit binary number is realized by 
the array of Figure 1. The algorithm can easily be extended to any $n$ bit number. The 
novelty of the algorithm lies in the fact that squaring of the number is carried out in 
steps, coupled with the ingenious use of left-shifts in the bit positions. 
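The repeated application of $(x + y)^2 = x^2 + y^2 + 2xy$ can be illustrated with a short recursive sketch in which $y$ is the LSB and $x$ is the rest of the number. This models the arithmetic only, not the cell-level wiring of Figure 1. The function names are hypothetical, and the three bit closed form is the expansion obtained by multiplying out $(a_2 2^2 + a_1 2^1 + a_0 2^0)^2$ with $a_i^2 = a_i$.

```python
def square_lsb_split(n):
    """Square a non-negative integer by peeling off its least significant bit."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    y = n & 1        # y is the LSB
    rest = n >> 1    # x = 2 * rest is the remainder of the number
    x_squared = square_lsb_split(rest) << 2  # x^2 = rest^2 shifted left two places
    cross = (rest << 2) if y else 0          # 2xy = 2 * (2 * rest) * y
    return x_squared + cross + y             # y^2 = y for a single bit


def square_three_bit(a2, a1, a0):
    """Closed form for a three bit number; bit 1 of the result is always 0."""
    return ((a2 + a2 * a1) << 4) + ((a2 * a0) << 3) + ((a1 * a0 + a1) << 2) + a0
```

Both functions agree with ordinary multiplication on every three bit input, for example `square_lsb_split(5)` and `square_three_bit(1, 0, 1)` both give 25.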


3 Comparison 

The implementation of the proposed algorithm for a 3 bit and a 4 bit number is 
illustrated in Figures 1 and 3 respectively. The proposed array is built of the basic 
half-adder cell shown in Figure 2. Its function may be defined as follows: 

$u = (w + v_{-1}) \oplus (xy)$ 

$v = (w + v_{-1}) \cdot (xy)$ 

The symbols $+$ and $\cdot$ stand for the Inclusive-Or and And operations in the above 
expressions. 
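The cell function can be modelled bit by bit. The sketch below reads the two expressions as $u = (w + v_{-1}) \oplus (xy)$ and $v = (w + v_{-1}) \cdot (xy)$, with $\oplus$ taken to be Exclusive-Or. The function name `basic_cell` and the argument name `v_prev` (standing for $v_{-1}$) are hypothetical, and the interpretation of $u$ as the sum output and $v$ as the carry output is an assumption based on the half-adder description.

```python
def basic_cell(w, v_prev, x, y):
    """Model of the basic cell: half-add (w OR v_prev) and (x AND y).

    Returns (u, v), taken here to be the sum and carry outputs.
    """
    for bit in (w, v_prev, x, y):
        if bit not in (0, 1):
            raise ValueError("inputs must be 0 or 1")
    operand = w | v_prev   # Inclusive-Or of the two incoming signals
    product = x & y        # partial product of the two input bits
    u = operand ^ product  # sum output
    v = operand & product  # carry output
    return u, v
```

For example, `basic_cell(1, 0, 1, 1)` returns `(0, 1)`. Because `w` and `v_prev` are combined with an Or rather than added, the cell behaves as a true adder only when the two are never 1 at the same time.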

The implementation of a 3 bit squarer based on Dean’s algorithm is also illustrated in 
Figures 6 and 7. The basic cell (Figure 7) has two control inputs A and B. The inputs 
on the lines C and D are added in the cell, S being the sum out and P being the carry 
out. When both A and B are present, a further digit is added to the sum (and carry), so 
that the cell then functions as a full-adder [1]. 

It can be seen that the proposed array has $1 + \sum_{i=3}^{n} i$ cells whereas Dean’s array [1] uses 
$1 + \sum_{i=2}^{n-1} i$ cells, resulting in an overhead of $(n - 2)$ cells. However, the hardware inside the 
proposed basic cell is much simpler, as it utilizes only half-adders, compared to full-adders 
in Dean’s array. So the increase in the number of cells is offset by the reduction in the 
complexity of the individual cell. This leads to the authors’ contention that the hardware 
overhead, which translates into increased chip area, is almost negligible. Moreover, the 
propagation time through the proposed array is only $n\tau$ as compared to $(2n - 3)\tau$, which 
is the delay through Dean’s array. The hardware overhead-speed gain relation follows the 
square law for most specialized arithmetic arrays. Here, an increase in speed has been 
accomplished with a linear increase in hardware. 
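The trade-off quoted above can be tabulated for a given word length. The sketch below uses only the figures stated in the text: a propagation time of $n$ cell delays for the proposed array, $(2n - 3)$ cell delays for Dean’s array, and an overhead of $(n - 2)$ cells. The function name `compare_arrays` is hypothetical, and the restriction to $n \geq 3$ is an assumption made here because the examples in the paper start at three bits.

```python
def compare_arrays(n):
    """Return the quoted delays (in cell delays) and cell overhead for n bits."""
    if n < 3:
        raise ValueError("n is assumed to be at least 3")
    return {
        "proposed_delay": n,       # propagation time through the proposed array
        "dean_delay": 2 * n - 3,   # propagation time through Dean's array
        "extra_cells": n - 2,      # additional cells used by the proposed array
    }
```

For an eight bit operand this gives 8 cell delays against 13, for 6 additional cells. At $n = 3$ the two delays coincide, so the speed gain appears only for longer operands.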

The proposed array has a number of unused inputs which can be used to add in 
another number so that the array would function as a full squarer (all outputs in 1 state). 
A specialized array of this sort has a number of applications, including the generation of 
binary logarithms [2], which depends on iterative squaring. 

4 Conclusions 

A new cellular array for extraction of squares of binary numbers has been presented. A 
squaring algorithm based on the formula $(x + y)^2$ has been described. Compared to Dean’s 
array, the proposed array reduces the propagation time from $(2n - 3)\tau$ to $n\tau$ at the 
expense of $(n - 2)$ additional, simpler cells. It is hoped that the algorithm discussed in 
this paper will provide fresh insights to reduce redundant hardware present in most of the 
existing squaring arrays. 

Figure 2: Basic cell used in the proposed squaring array 

Figure 4: A three bit squaring array using the general algorithm 

Figure 6: Dean’s array for t3 